Beamforming techniques that employ Singular Value Decomposition (SVD) are commonly used in Multi-Input
Multi-Output (MIMO) wireless communication systems. In the absence of channel coding, when a single symbol is
transmitted, these systems achieve the full diversity order provided by the channel; when multiple symbols
are simultaneously transmitted, this property is lost. When channel coding is employed, full diversity order can
be achieved. For example, when Bit-Interleaved Coded Modulation (BICM) is combined with this technique, full
diversity order of $NM$ in an $M \times N$ MIMO channel transmitting $S$ parallel streams is possible, provided a
condition on $S$ and the BICM convolutional code rate is satisfied. In this paper, we present constellation precoded
multiple beamforming, which can achieve the full diversity order in both BICM-coded and uncoded SVD systems.
We provide an analytical proof of this property. To reduce the computational complexity of Maximum Likelihood
(ML) decoding in this system, we employ Sphere Decoding (SD). We report an SD technique that reduces the
computational complexity beyond commonly used approaches to SD. This technique achieves several orders of
magnitude reduction in computational complexity, not only with respect to conventional ML decoding but also
with respect to conventional SD.

Index Terms 

MIMO systems, SVD, BICMB, constellation precoding, sphere decoding. 

I. Introduction 

When perfect channel state information is available at the transmitter, beamforming is employed
to achieve spatial multiplexing and thereby increase the data rate, or to enhance the performance of a
Multiple-Input Multiple-Output (MIMO) system [1]. The beamforming vectors are designed in [2], [3]
for various design criteria, and can be obtained by the Singular Value Decomposition (SVD), leading to
a channel-diagonalizing structure that is optimum in minimizing the average Bit Error Rate (BER) [3]. Uncoded
Single Beamforming (SB), which carries only one symbol at a time, was shown to achieve the full
diversity order of $NM$, where $N$ and $M$ are the number of transmit and receive antennas, respectively
[4], [5]. However, the diversity order of uncoded multiple beamforming, which increases the throughput
by sending multiple symbols at a time, is $(N - S + 1)(M - S + 1)$ when the symbols are transmitted on
the subchannels with the largest $S$ singular values, so the full diversity order is lost over flat fading channels.

It is known that an SVD subchannel with a larger singular value provides a larger diversity gain. Under
the simultaneous parallel transmission of the symbols on the diagonalized subchannels, the performance
at high Signal-to-Noise Ratio (SNR) is dominated by the subchannel with the smallest singular value. To
overcome the degradation of the diversity order of multiple beamforming, Bit-Interleaved Coded Multiple
Beamforming (BICMB) was proposed [6], [7]. This scheme interleaves the codewords through the multiple
subchannels with different diversity orders, resulting in a better diversity order. BICMB can achieve the full
diversity order offered by the channel as long as the code rate $R_c$ and the number of employed subchannels
$S$ satisfy the condition $R_c S \le 1$ [8].

In this paper, we present a multiple beamforming technique that achieves the full diversity order in both
the coded and the uncoded systems. This technique employs the constellation precoding scheme [9],
[10], [11], [12], [13], which is used for space-time or space-frequency block codes to increase the system
data rate without losing the full diversity order. We show via Pairwise Error Probability (PEP) analysis
that Fully Precoded Multiple Beamforming (FPMB) with Maximum Likelihood (ML) detection achieves
the full diversity order even in the absence of any channel coding. We also present the diversity analysis
of Bit-Interleaved Coded Multiple Beamforming with Constellation Precoding (BICMB-CP), which adds
the constellation precoding stage to BICMB. We show that the addition of the constellation precoder to
BICMB, whose code rate $R_c$ is larger than $1/S$, provides the full diversity order when the subchannels for the
precoded symbols are properly chosen. Simulation results are shown to support the analysis.

Multiple beamforming without constellation precoding separates the MIMO channel into independent
parallel subchannels, enabling symbol-by-symbol detection on each subchannel. Since the precoder at the
transmitter no longer allows the parallel independent detection of the symbols on each subchannel, the
complexity of the ML detection for precoded symbols, which provides optimal performance, increases
exponentially with the number of possible constellation points of the modulation scheme and the dimension
of the constellation precoder. This complexity increase makes a receiver with ML detection unsuitable
for practical purposes [14]. On the other hand, Sphere Decoding (SD) was proposed as an alternative to
ML detection that provides optimal performance with reduced computational complexity [15].

Several complexity reduction techniques for SD have been proposed. In [16] and [17], attention is
drawn to the initial radius selection strategy, since an inappropriate initial radius can result in either a
large number of lattice points to be searched, or a number of restarted searches with increased initial
radius. In [18] and [19], the complexity is reduced by making a proper choice to update the sphere radius.
Other methods, such as the K-best lattice decoder [20], [21], and a combination of SD and the K-best decoder
[22], can significantly reduce the complexity at low SNR at the cost of BER performance.

In this paper, we propose an SD algorithm which efficiently reduces the complexity of constellation
precoded multiple beamforming over flat fading channels by reducing the average number of multiplications
required to obtain the optimal solution. This complexity reduction is accomplished by precalculating the
multiplications at the beginning of decoding, and recycling them later for the repetitive calculations. A
further reduction is achieved with the help of the lattice representation from our previous work presented in
[23], which introduces orthogonality between the real and imaginary parts of every detected symbol. Based
on Zero-Forcing Decision Feedback Equalization (ZF-DFE), the proposed SD algorithm includes a method
to determine the initial radius, reducing the average number of real multiplications needed to acquire one
precoded bit metric for BICMB-CP. With simulation results, we show that conventional SD reduces the
complexity substantially compared with the exhaustive search, and that the complexity can be further reduced
by our proposed SD. The complexity reduction becomes larger as the constellation precoder
dimension and the constellation size become larger.

The rest of this paper is organized as follows. The description of uncoded and coded multiple beamforming
combined with constellation precoding is given in Section II. Sections III and IV present the diversity
analysis of the MIMO schemes through the calculation of the upper bound to PEP. The reduced computational
complexity sphere detection algorithm is discussed in Section V. Simulation results supporting
the analysis are shown in Section VI. Finally, we end the paper with our conclusion in Section VII.

Notation: Bold lower (upper) case letters denote vectors (matrices). $\mathrm{diag}[\mathbf{B}_1, \cdots, \mathbf{B}_P]$ stands for a
block diagonal matrix with matrices $\mathbf{B}_1, \cdots, \mathbf{B}_P$, and $\mathrm{diag}[b_1, \cdots, b_P]$ is a diagonal matrix with diagonal
entries $b_1, \cdots, b_P$. $\Re(\cdot)$ and $\Im(\cdot)$ denote the real and imaginary part of a complex number, respectively.
The superscripts $(\cdot)^H$, $(\cdot)^T$, $(\cdot)^*$, $\bar{(\cdot)}$ stand for conjugate transpose, transpose, complex conjugate, binary
complement, respectively, and $\forall$ denotes for-all. $\lceil\cdot\rceil$ is the ceiling function that maps a real number to
the next largest integer. $\mathbb{R}^+$ and $\mathbb{C}$ stand for the set of positive real numbers and the complex numbers,
respectively. $d_{\min}$ is the minimum Euclidean distance between two points in a constellation.

II. System Model 

A. Uncoded Multiple Beamforming with Constellation Precoding 

Uncoded Multiple Beamforming with Constellation Precoding (UMB-CP) transforms modulated symbols
to precoded symbols via a precoding matrix. The $S \times 1$ symbol vector $\mathbf{x}$, where $S \le \min(N, M)$, is
precoded by a square matrix $\tilde{\Theta}$. We assume that the elements of $\mathbf{x}$ belong to a signal set $\chi \subset \mathbb{C}$ of size
$|\chi| = 2^m$, such as $2^m$-QAM, where $m$ is the number of input bits to the Gray encoder. The precoder is
expressed as









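$$\tilde{\Theta} = \mathbf{T} \begin{bmatrix} \Theta & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_{S-P} \end{bmatrix} \tag{1}$$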



where $\Theta$ is a $P \times P$ constellation precoding matrix that precodes the first $P$ modulated symbols of the
vector $\mathbf{x}$. When all of the $S$ modulated symbols are precoded ($P = S$), we call the resulting system Fully
Precoded Multiple Beamforming (FPMB); otherwise, we call it Partially Precoded Multiple Beamforming
(PPMB). The permutation matrix $\mathbf{T}$ reorders the $P$ precoded symbols and the $S - P$ non-precoded symbols to
be transmitted on the predefined subchannels created by the SVD of the MIMO channel. Let us define
$\boldsymbol{\eta} = [\eta_1 \cdots \eta_P]$ as a vector whose element $\eta_p$ is the index of a subchannel on which a precoded symbol is
transmitted, ordered increasingly such that $\eta_p < \eta_q$ for $p < q$. In the same way, $\boldsymbol{\omega} = [\omega_1 \cdots \omega_{(S-P)}]$
is defined as an increasingly ordered vector whose elements are the indices of the subchannels which
carry the non-precoded symbols.

The MIMO channel $\mathbf{H} \in \mathbb{C}^{M \times N}$ is assumed to be quasi-static, Rayleigh, and flat fading, and perfectly
known to both the transmitter and the receiver. The beamforming matrices are determined by the SVD
of the MIMO channel, i.e., $\mathbf{H} = \mathbf{U}\Lambda\mathbf{V}^H$ where $\mathbf{U}$ and $\mathbf{V}$ are unitary matrices, and $\Lambda$ is a diagonal
matrix whose $s$-th diagonal element, $\lambda_s \in \mathbb{R}^+$, is a singular value of $\mathbf{H}$ in decreasing order. When $S$
symbols are transmitted at the same time, the first $S$ vectors of $\mathbf{U}$ and $\mathbf{V}$ are chosen to be used
as beamforming matrices at the receiver and the transmitter, respectively. In Fig. 1(a), which displays the
structure of UMB-CP, $\tilde{\mathbf{U}}$ and $\tilde{\mathbf{V}}$ denote the beamforming matrices picked from $\mathbf{U}$ and $\mathbf{V}$.
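The diagonalizing effect of the SVD-based beamformers can be checked numerically. The following is a minimal sketch, assuming NumPy; the function name `svd_beamformers`, the antenna counts, and the random seed are illustrative choices and not part of the paper.

```python
import numpy as np

def svd_beamformers(H, S):
    """Return the first S left/right singular vectors of H and the S largest singular values."""
    U, sv, Vh = np.linalg.svd(H)            # H = U @ diag(sv) @ Vh, sv sorted in decreasing order
    return U[:, :S], sv[:S], Vh.conj().T[:, :S]

rng = np.random.default_rng(0)
M, N, S = 4, 4, 2                           # receive antennas, transmit antennas, streams (illustrative)
# Rayleigh flat-fading channel: i.i.d. complex Gaussian entries, zero mean, unit variance
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

U_S, sv, V_S = svd_beamformers(H, S)
# Channel seen between the transmit beamformer V_S and the receive beamformer U_S^H
effective = U_S.conj().T @ H @ V_S
print(np.round(np.abs(effective), 6))
print(sv)
```

The effective channel is diagonal with the $S$ largest singular values on its diagonal, which is what allows the subchannels to be treated in parallel when no constellation precoder is used.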

The serial-to-parallel converter organizes the symbol vector $\mathbf{x}$ as $\mathbf{x} = [\mathbf{x}_\eta^T : \mathbf{x}_\omega^T]^T = [x_{\eta_1} \cdots x_{\eta_P} :
x_{\omega_1} \cdots x_{\omega_{(S-P)}}]^T$, where $\mathbf{x}_\eta$ and $\mathbf{x}_\omega$ consist of the modulated entries to be transmitted on the subchannels
specified in $\boldsymbol{\eta}$ and $\boldsymbol{\omega}$, respectively. The $S \times 1$ detected symbol vector $\mathbf{y} = [\mathbf{y}_p^T : \mathbf{y}_n^T]^T = [y_1 \cdots y_P :
y_{P+1} \cdots y_S]^T$ at the receiver is written as

$$\mathbf{y} = \Gamma\tilde{\Theta}\mathbf{x} + \mathbf{n} \tag{2}$$

where $\Gamma$ is a block diagonal matrix, $\Gamma = \mathrm{diag}[\Gamma_p, \Gamma_n]$, with diagonal matrices defined as $\Gamma_p = \mathrm{diag}[\lambda_{\eta_1},
\cdots, \lambda_{\eta_P}]$, $\Gamma_n = \mathrm{diag}[\lambda_{\omega_1}, \cdots, \lambda_{\omega_{(S-P)}}]$, and $\mathbf{n} = [\mathbf{n}_p^T : \mathbf{n}_n^T]^T$ is an additive white Gaussian noise vector
with zero mean and variance $N_0 = N/SNR$. The entries of $\mathbf{H}$ are complex Gaussian with zero mean and
unit variance, and to make the received signal-to-noise ratio $SNR$, the total transmitted power is scaled
as $N$. The input-output relation in (2) is decomposed into two equations as



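$$\mathbf{y}_p = \Gamma_p\Theta\mathbf{x}_\eta + \mathbf{n}_p, \qquad \mathbf{y}_n = \Gamma_n\mathbf{x}_\omega + \mathbf{n}_n. \tag{3}$$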



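The ML decoding of the detected symbol $\hat{\mathbf{x}} = [\hat{\mathbf{x}}_\eta^T : \hat{\mathbf{x}}_\omega^T]^T = [\hat{x}_{\eta_1} \cdots \hat{x}_{\eta_P} : \hat{x}_{\omega_1} \cdots \hat{x}_{\omega_{(S-P)}}]^T$ is given by

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \chi^S} \|\mathbf{y} - \Gamma\tilde{\Theta}\mathbf{x}\|^2 \tag{4}$$

where $\chi^S$ represents the $S$-dimensional product space based on $\chi$. For PPMB, the symbols can be detected
in a parallel fashion as

$$\hat{\mathbf{x}}_\eta = \arg\min_{\mathbf{x} \in \chi^P} \|\mathbf{y}_p - \Gamma_p\Theta\mathbf{x}\|^2 \tag{5}$$

for the precoded symbols, and

$$\hat{x}_l = \arg\min_{x \in \chi} |y_l - \lambda_{l'}x| \tag{6}$$

for the non-precoded symbol, where $l'$ is the corresponding subchannel index transformed by $\mathbf{T}$.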
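The parallel detection in (5) and (6) can be sketched as an exhaustive search. This is a minimal illustration, assuming NumPy; the function names, the singular values, and the $2 \times 2$ unitary matrix used for $\Theta$ are hypothetical placeholders and are not the precoder designs of the paper.

```python
import itertools
import numpy as np

def ml_detect_precoded(y_p, gamma_p, theta, constellation):
    """Exhaustive search for argmin ||y_p - diag(gamma_p) @ theta @ x||^2 over x in chi^P, as in (5)."""
    F = np.diag(gamma_p) @ theta
    best_x, best_metric = None, np.inf
    for candidate in itertools.product(constellation, repeat=theta.shape[0]):
        x = np.array(candidate)
        metric = np.linalg.norm(y_p - F @ x) ** 2
        if metric < best_metric:
            best_x, best_metric = x, metric
    return best_x

def ml_detect_single(y_l, lam, constellation):
    """Symbol-by-symbol detection of one non-precoded symbol, as in (6)."""
    return min(constellation, key=lambda x: abs(y_l - lam * x))

qam4 = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
# Placeholder unitary precoder (assumption, for illustration only)
theta = np.array([[1, np.exp(1j * np.pi / 4)],
                  [1, -np.exp(1j * np.pi / 4)]]) / np.sqrt(2)
gamma_p = np.array([2.0, 0.7])              # singular values of the precoded subchannels (illustrative)
x_true = np.array([1 + 1j, -1 - 1j])
y_p = np.diag(gamma_p) @ theta @ x_true     # noiseless observation of the precoded part
x_hat = ml_detect_precoded(y_p, gamma_p, theta, qam4)
x_l = ml_detect_single(0.5 * (1 - 1j) + 0.05, 0.5, qam4)
```

The search in `ml_detect_precoded` visits $|\chi|^P$ candidates, which is the exponential growth that motivates the sphere decoder of Section V, while the non-precoded symbols cost only $|\chi|$ comparisons each.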

B. Bit-Interleaved Coded Multiple Beamforming with Constellation Precoding 

Fig. 1(b) represents the structure of Bit-Interleaved Coded Multiple Beamforming with Constellation
Precoding (BICMB-CP). First, the convolutional encoder with code rate $R_c = k_c/n_c$, possibly combined
with a perforation matrix for a high rate punctured code, generates the codeword $\mathbf{c}$ from the information
bits. Then, the spatial interleaver $\pi_s$ distributes the coded bits into $S$ streams, each of which is interleaved
by an independent bit-wise interleaver $\pi_b$. The interleaved bits are mapped by Gray encoding onto the
symbol sequence $\mathbf{X} = [\mathbf{x}_1 \cdots \mathbf{x}_K]$, where $\mathbf{x}_k$ is an $S \times 1$ symbol vector at the $k$-th time instant. Each
entry of $\mathbf{x}_k$ belongs to a signal set $\chi$.

The symbol vector $\mathbf{x}_k$ is multiplied by the $S \times S$ precoder in (1). When all of the $S$ modulated
entries are precoded ($P = S$), we call the resulting system Bit-Interleaved Coded Multiple Beamforming
with Full Precoding (BICMB-FP); otherwise, we call it Bit-Interleaved Coded Multiple Beamforming
with Partial Precoding (BICMB-PP). The precoded symbol vector is transmitted on the MIMO channel
described in Section II-A.

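As in UMB-CP, the spatial interleaver arranges the symbol vector $\mathbf{x}_k$ as $\mathbf{x}_k = [\mathbf{x}_{k,\eta}^T : \mathbf{x}_{k,\omega}^T]^T = [x_{k,\eta_1} \cdots
x_{k,\eta_P} : x_{k,\omega_1} \cdots x_{k,\omega_{(S-P)}}]^T$. The $S \times 1$ detected symbol vector $\mathbf{r}_k = [(\mathbf{r}_k^p)^T : (\mathbf{r}_k^n)^T]^T = [r_{k,1} \cdots
r_{k,P} : r_{k,P+1} \cdots r_{k,S}]^T$ at the $k$-th time instant is

$$\mathbf{r}_k = \Gamma\tilde{\Theta}\mathbf{x}_k + \mathbf{n}_k \tag{7}$$

where $\mathbf{n}_k = [(\mathbf{n}_k^p)^T : (\mathbf{n}_k^n)^T]^T$ is an additive white Gaussian noise vector.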

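The location of the coded bit $c_{k'}$ within the symbol sequence $\mathbf{X}$ is known as $k' \to (k, l, i)$, where
$k$, $l$, and $i$ are the time instant in $\mathbf{X}$, the symbol position in $\mathbf{x}_k$, and the bit position on the label $x_{k,l}$,
respectively. Let $\chi_b^i$ denote a subset of $\chi$ whose labels have $b \in \{0, 1\}$ in the $i$-th bit position. By using the
location information and the input-output relation in (7), the receiver calculates the maximum likelihood
bit metrics for the coded bit $c_{k'}$ as

$$\gamma^{l,i}(\mathbf{r}_k, c_{k'}) = \min_{\mathbf{x} \in \xi_{c_{k'}}^{l,i}} \|\mathbf{r}_k - \Gamma\tilde{\Theta}\mathbf{x}\|^2 \tag{8}$$

where $\xi_b^{l,i}$ is a subset of $\chi^S$, defined as

$$\xi_b^{l,i} = \{\mathbf{x} = [x_1 \cdots x_S]^T : x_{s|s=l} \in \chi_b^i \text{ and } x_{s|s \ne l} \in \chi\}.$$

In particular, based on the decomposition of (7) similar to (3), the bit metrics, equivalent to (8)
for partial precoding, are

$$\gamma^{l,i}(\mathbf{r}_k, c_{k'}) = \begin{cases} \min\limits_{\mathbf{x} \in \psi_{c_{k'}}^{l,i}} \|\mathbf{r}_k^p - \Gamma_p\Theta\mathbf{x}\|^2, & \text{if } 1 \le l \le P \\ \min\limits_{x \in \chi_{c_{k'}}^{i}} |r_{k,l} - \lambda_{l'}x|^2, & \text{if } P + 1 \le l \le S \end{cases} \tag{9}$$

where $\psi_b^{l,i}$ is a subset of $\chi^P$, defined as

$$\psi_b^{l,i} = \{\mathbf{x} = [x_1 \cdots x_P]^T : x_{s|s=l} \in \chi_b^i \text{ and } x_{s|s \ne l} \in \chi\},$$

and $l'$ is an entry in $\boldsymbol{\omega}$, corresponding to the subchannel mapped by $\mathbf{T}$. Finally, the ML decoder makes
decisions according to the rule

$$\hat{\mathbf{c}} = \arg\min_{\mathbf{c}} \sum_{k'} \gamma^{l,i}(\mathbf{r}_k, c_{k'}). \tag{10}$$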



III. Diversity Analysis : UMB-CP 

A. Fully Precoded Multiple Beamforming

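Based on the ML decoding in (4), the upper bound to the instantaneous PEP between the transmitted
symbol $\mathbf{x}$ and the detected symbol $\hat{\mathbf{x}}$ is calculated as

$$\Pr(\mathbf{x} \to \hat{\mathbf{x}} \mid \mathbf{H}) = \Pr\left(\|\mathbf{y} - \Gamma\tilde{\Theta}\mathbf{x}\|^2 \ge \|\mathbf{y} - \Gamma\tilde{\Theta}\hat{\mathbf{x}}\|^2 \mid \mathbf{H}\right) \le \frac{1}{2}\exp\left(-\frac{\|\Gamma\tilde{\Theta}(\mathbf{x} - \hat{\mathbf{x}})\|^2}{4N_0}\right). \tag{11}$$

Let $\mathbf{d} = [d_1 \cdots d_S]^T = \tilde{\Theta}(\mathbf{x} - \hat{\mathbf{x}})$. Then, for FPMB, the average PEP becomes

$$\Pr(\mathbf{x} \to \hat{\mathbf{x}}) \le E\left[\frac{1}{2}\exp\left(-\frac{\sum_{s=1}^{S}\lambda_s^2|d_s|^2}{4N_0}\right)\right]. \tag{12}$$

In [8], we showed that equations in the form of (12) have a closed form upper bound expression. We
provide a formal statement below.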

Theorem 1: Consider the $S \le \min(N, M)$ ordered eigenvalues $\mu_1 > \cdots > \mu_S$ of the uncorrelated
central Wishart matrix¹ [24], and a weight vector $\boldsymbol{\phi} = [\phi_1 \cdots \phi_S]^T$ with nonnegative real elements. In
the high signal-to-noise ratio regime, an upper bound for the expression $E[\exp(-\gamma\sum_{s=1}^{S}\phi_s\mu_s)]$, which is
used in the diversity analysis of a number of MIMO systems, is

$$E\left[\exp\left(-\gamma\sum_{s=1}^{S}\phi_s\mu_s\right)\right] \le \zeta\left(\phi_{\min}\gamma\right)^{-(N-\delta+1)(M-\delta+1)}$$

where $\gamma$ is the signal-to-noise ratio, $\zeta$ is a constant, $\phi_{\min} = \min\{\phi_\delta, \cdots, \phi_S\}$, and $\delta$ is the index indicating
the first nonzero element in the weight vector.

¹A central Wishart matrix is the Hermitian matrix $\mathbf{A}\mathbf{A}^H$ where the entries of the matrix $\mathbf{A}$ are complex Gaussian with zero mean so that
$E[\mathbf{A}] = \mathbf{0}$. The Wishart matrix $\mathbf{A}\mathbf{A}^H$ is called uncorrelated if the common covariance matrix, defined as $\mathbf{C} = E[\mathbf{a}_s\mathbf{a}_s^H]\ \forall s$, where $\mathbf{a}_s$ is
the $s$-th column vector of $\mathbf{A}$, satisfies $\mathbf{C} = \mathbf{I}$.

Proof: See [8]. □
Applying Theorem 1 to (12), we get the upper bound to PEP as

$$\Pr(\mathbf{x} \to \hat{\mathbf{x}}) \le \zeta\left(\frac{d_{\min}}{4N}SNR\right)^{-(N-\delta+1)(M-\delta+1)} \tag{13}$$

where $\zeta$ is a constant, $d_{\min} = \min\{|d_\delta|^2, \cdots, |d_S|^2\}$, and $\delta$ is an index indicating the first nonzero element
of the vector $[|d_1|^2 \cdots |d_S|^2]$. Therefore, FPMB achieves the full diversity order if $\delta$ from any distinct
pair is equal to 1, which implies that $|d_1|^2 = |\boldsymbol{\theta}_1^T(\mathbf{x} - \hat{\mathbf{x}})|^2 > 0$ for any distinct pair, where $\boldsymbol{\theta}_1^T$ is the first
row vector of $\tilde{\Theta}$. Several methods to build the precoding matrix are described in [25] and [26].

B. Partially Precoded Multiple Beamforming 

Generalizing (11) for PPMB, we get an upper bound to PEP as

$$\Pr(\mathbf{x} \to \hat{\mathbf{x}}) \le E\left[\frac{1}{2}\exp\left(-\frac{\kappa}{4N_0}\right)\right] \tag{14}$$

where

$$\kappa = \sum_{s=1}^{P}\lambda_{\eta_s}^2|d_s|^2 + \sum_{s=1}^{S-P}\lambda_{\omega_s}^2|x_{\omega_s} - \hat{x}_{\omega_s}|^2 \tag{15}$$

and $d_s$ is the $s$-th element of a vector $\mathbf{d} = \Theta(\mathbf{x}_\eta - \hat{\mathbf{x}}_\eta)$. Let us assume that the constellation precoding
matrix meets the condition of FPMB to achieve the full diversity order. Since the expression (14) with
(15) has a closed form expression similar to (13) as described in FPMB, the $\delta$ value needs to be obtained
from a composite vector with the elements $|d_s|^2$ and $|x_{\omega_s} - \hat{x}_{\omega_s}|^2$, to observe the diversity behavior of
a given pairwise error. In addition, a different pair can lead to different diversity behavior. Therefore, we
need to get the maximum $\delta$ out of all the possible pairwise errors to decide the diversity order of a given
PPMB system.

All of the distinct pairs of $\mathbf{x}$ and $\hat{\mathbf{x}}$ are divided into three groups in terms of $\mathbf{x}_\eta$, $\hat{\mathbf{x}}_\eta$, $\mathbf{x}_\omega$, and $\hat{\mathbf{x}}_\omega$.
The first group includes the pairs that have $\mathbf{x}_\eta = \hat{\mathbf{x}}_\eta$ but $\mathbf{x}_\omega \ne \hat{\mathbf{x}}_\omega$, and the second group comprises the
pairs satisfying $\mathbf{x}_\eta \ne \hat{\mathbf{x}}_\eta$ but $\mathbf{x}_\omega = \hat{\mathbf{x}}_\omega$. Finally, the last group consists of the pairs for which $\mathbf{x}_\eta \ne \hat{\mathbf{x}}_\eta$ and
$\mathbf{x}_\omega \ne \hat{\mathbf{x}}_\omega$. We will present the method to calculate the maximum $\delta$ for each group, and to find $\delta_{\max}$ from
the groups.

Since the vector $\mathbf{d}$ is a zero vector for the first group, the first summation of $\kappa$ in (15) is zero, resulting
in $\delta$ being equal to the smallest index in $\boldsymbol{\omega}$ whose symbol is in error. By considering all of the possible pairs, we easily see that
$\omega_1 \le \delta \le \omega_{(S-P)}$. Therefore, the maximum value is $\delta_1 = \omega_{(S-P)}$, which corresponds to the pair satisfying
$x_s = \hat{x}_s$ for all $s$ except $s = \omega_{(S-P)}$. For any pair in the second group, the term with the singular value $\lambda_{\eta_1}$
survives in $\kappa$, according to the inherited property of the constellation precoding matrix, i.e., $|d_1|^2 > 0$.
However, the second summation in $\kappa$ disappears since $\mathbf{x}_\omega = \hat{\mathbf{x}}_\omega$. Therefore, the maximum value of this
group is $\delta_2 = \eta_1$. Now, for the third group, both summations in $\kappa$ exist. Then, $\delta$ is chosen to be the smaller
value between $\eta_1$ and the smallest index in $\boldsymbol{\omega}$ whose symbol is in error. In the same manner as in the analysis of
the first group, the maximum of the latter is found to be $\omega_{(S-P)}$. Therefore, the maximum $\delta$ for
this group is $\delta_3 = \min\{\eta_1, \omega_{(S-P)}\}$. Finally, $\delta_{\max}$ is decided as

$$\delta_{\max} = \max\{\delta_1, \delta_2, \delta_3\} = \max\left(\eta_1, \omega_{(S-P)}\right). \tag{16}$$

Example: We provide the diversity analysis of the $4 \times 4$ PPMB system with $S = 4$ and $P = 2$. In this
example, we assume that the precoded symbols are transmitted on subchannels 1 and 3 while the
non-precoded symbols are transmitted on subchannels 2 and 4. Then, this configuration gives $\boldsymbol{\eta} = [1\ 3]$ and
$\boldsymbol{\omega} = [2\ 4]$. By following the result in (16), $\delta_{\max}$ is equal to $\max(1, 4) = 4$, leading to the diversity order
of 1. The pairwise errors satisfying $x_1 = \hat{x}_1$, $x_2 = \hat{x}_2$, $x_3 = \hat{x}_3$, but $x_4 \ne \hat{x}_4$ inflict loss on the diversity
order of this system. Table I summarizes the diversity order analysis for all of the possible combinations
of the $4 \times 4$ PPMB system. We will provide simulation results that verify this analysis in Section VI,
specifically in Fig. 4.
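The rule in (16), combined with the diversity exponent $(N-\delta+1)(M-\delta+1)$ of Theorem 1, can be evaluated mechanically. The sketch below is a small illustration in plain Python; the function name `ppmb_diversity` is hypothetical, and the computation assumes, as the analysis does, that the constellation precoder satisfies the full diversity condition of FPMB.

```python
from itertools import combinations

def ppmb_diversity(N, M, eta, omega):
    """Diversity order of uncoded PPMB from delta_max = max(eta_1, omega_(S-P)), following (16).

    eta and omega hold the 1-based subchannel indices of the precoded and
    non-precoded symbols. An empty omega corresponds to FPMB.
    """
    delta_max = max(min(eta), max(omega)) if omega else min(eta)
    return (N - delta_max + 1) * (M - delta_max + 1)

# The example of the text: eta = [1 3], omega = [2 4] in a 4 x 4 system
example = ppmb_diversity(4, 4, [1, 3], [2, 4])

# Enumerate every choice of P = 2 precoded subchannels out of S = 4
for eta in combinations(range(1, 5), 2):
    omega = [s for s in range(1, 5) if s not in eta]
    print(list(eta), omega, ppmb_diversity(4, 4, list(eta), omega))
```

Since the weakest subchannel carries a non-precoded symbol whenever it is not in $\boldsymbol{\eta}$, the enumeration makes it easy to see which subchannel assignments keep $\delta_{\max}$ small.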

IV. Diversity Analysis : BICMB-CP 

A. BICMB with Full Precoding 

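We assume that the $d_H$ coded bits are interleaved such that they are placed in distinct symbols, where
$d_H$ denotes the Hamming distance between the transmitted codeword $\mathbf{c}$ and the decoded codeword $\hat{\mathbf{c}}$.
Since the bit metrics in (8) are the same for the same coded bits between the pairwise errors, the original
PEP is replaced by

$$\Pr(\mathbf{c} \to \hat{\mathbf{c}} \mid \mathbf{H}) = \Pr\left(\sum_{k,d_H}\min_{\mathbf{x} \in \xi_{c_{k'}}^{l,i}}\|\mathbf{r}_k - \Gamma\tilde{\Theta}\mathbf{x}\|^2 \ge \sum_{k,d_H}\min_{\mathbf{x} \in \xi_{\hat{c}_{k'}}^{l,i}}\|\mathbf{r}_k - \Gamma\tilde{\Theta}\mathbf{x}\|^2\right) \tag{17}$$

where the summation $\sum_{k,d_H}$ is restricted to the symbols corresponding to the $d_H$ different coded bits.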
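Let us define $\tilde{\mathbf{x}}_k$ and $\hat{\mathbf{x}}_k$ as

$$\tilde{\mathbf{x}}_k = \arg\min_{\mathbf{x} \in \xi_{c_{k'}}^{l,i}}\|\mathbf{r}_k - \Gamma\tilde{\Theta}\mathbf{x}\|^2, \qquad \hat{\mathbf{x}}_k = \arg\min_{\mathbf{x} \in \xi_{\bar{c}_{k'}}^{l,i}}\|\mathbf{r}_k - \Gamma\tilde{\Theta}\mathbf{x}\|^2 \tag{18}$$

where $\bar{c}_{k'}$ is the complement of $c_{k'}$ in binary codes. It is easily found that $\tilde{\mathbf{x}}_k$ is different from $\hat{\mathbf{x}}_k$ since
the sets that the symbols belong to are disjoint, as can be seen from the definition of $\xi_b^{l,i}$. In the same
manner, we see that $\mathbf{x}_k$ is different from $\hat{\mathbf{x}}_k$. With $\tilde{\mathbf{x}}_k$ and $\hat{\mathbf{x}}_k$, we get, from (17),

$$\Pr(\mathbf{c} \to \hat{\mathbf{c}} \mid \mathbf{H}) = \Pr\left(\sum_{k,d_H}\|\mathbf{r}_k - \Gamma\tilde{\Theta}\tilde{\mathbf{x}}_k\|^2 \ge \sum_{k,d_H}\|\mathbf{r}_k - \Gamma\tilde{\Theta}\hat{\mathbf{x}}_k\|^2\right). \tag{19}$$

Based on the fact that $\|\mathbf{r}_k - \Gamma\tilde{\Theta}\mathbf{x}_k\|^2 \ge \|\mathbf{r}_k - \Gamma\tilde{\Theta}\tilde{\mathbf{x}}_k\|^2$ and the relation in (7), equation (19) is
upper-bounded by

$$\Pr(\mathbf{c} \to \hat{\mathbf{c}} \mid \mathbf{H}) \le \Pr\left(\beta \ge \sum_{k,d_H}\|\Gamma\tilde{\Theta}(\mathbf{x}_k - \hat{\mathbf{x}}_k)\|^2\right) \tag{20}$$

where

$$\beta = -\sum_{k,d_H}\left[(\mathbf{x}_k - \hat{\mathbf{x}}_k)^H\tilde{\Theta}^H\Gamma^H\mathbf{n}_k + \mathbf{n}_k^H\Gamma\tilde{\Theta}(\mathbf{x}_k - \hat{\mathbf{x}}_k)\right].$$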

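Since $\beta$ is a zero mean Gaussian random variable with variance $2N_0\sum_{k,d_H}\|\Gamma\tilde{\Theta}(\mathbf{x}_k - \hat{\mathbf{x}}_k)\|^2$, the right
hand side of (20) is replaced by the Q function as

$$\Pr(\mathbf{c} \to \hat{\mathbf{c}} \mid \mathbf{H}) \le Q\left(\sqrt{\frac{\sum_{k,d_H}\|\Gamma\tilde{\Theta}(\mathbf{x}_k - \hat{\mathbf{x}}_k)\|^2}{2N_0}}\right). \tag{21}$$

The numerator in (21) is rewritten as

$$\sum_{k,d_H}\|\Gamma\tilde{\Theta}(\mathbf{x}_k - \hat{\mathbf{x}}_k)\|^2 = \sum_{s=1}^{S}\lambda_s^2\sum_{k,d_H}|d_{k,s}|^2 \tag{22}$$

where $d_{k,s}$ is the $s$-th entry of the vector $\mathbf{d}_k = \tilde{\Theta}(\mathbf{x}_k - \hat{\mathbf{x}}_k)$. Using an upper bound to the Q function, we
calculate the average PEP as

$$\Pr(\mathbf{c} \to \hat{\mathbf{c}}) \le E\left[\frac{1}{2}\exp\left(-\frac{\sum_{s=1}^{S}\lambda_s^2\sum_{k,d_H}|d_{k,s}|^2}{4N_0}\right)\right]. \tag{23}$$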



According to Theorem 1, we can evaluate the diversity order of a given system by calculating the
weight vector whose $s$-th element is $\sum_{k,d_H}|d_{k,s}|^2$. In particular, if the constellation precoder is designed
such that



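$$|\boldsymbol{\theta}_1^T(\mathbf{x} - \hat{\mathbf{x}})|^2 > 0 \quad \text{for any distinct pair } \mathbf{x} \ne \hat{\mathbf{x}} \tag{24}$$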



where $\boldsymbol{\theta}_1^T$ is the first row vector of the precoding matrix $\tilde{\Theta}$, we see that $\sum_{k,d_H}|d_{k,1}|^2 > 0$, resulting in
the full diversity order of $NM$. Therefore, (24) is a sufficient condition for the full diversity order of
BICMB-FP.



B. BICMB with Partial Precoding 

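The bit metrics in (9) lead to the PEP calculation as

$$\Pr(\mathbf{c} \to \hat{\mathbf{c}} \mid \mathbf{H}) = \Pr(\tau_1 \ge \tau_2) \tag{25}$$

where

$$\tau_1 = \sum_{k,d_H^p}\min_{\mathbf{x} \in \psi_{c_{k'}}^{l,i}}\|\mathbf{r}_k^p - \Gamma_p\Theta\mathbf{x}\|^2 + \sum_{k,d_H^n}\min_{x \in \chi_{c_{k'}}^{i}}|r_{k,l} - \lambda_{l'}x|^2,$$

$$\tau_2 = \sum_{k,d_H^p}\min_{\mathbf{x} \in \psi_{\hat{c}_{k'}}^{l,i}}\|\mathbf{r}_k^p - \Gamma_p\Theta\mathbf{x}\|^2 + \sum_{k,d_H^n}\min_{x \in \chi_{\hat{c}_{k'}}^{i}}|r_{k,l} - \lambda_{l'}x|^2.$$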

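$\sum_{k,d_H^p}$ and $\sum_{k,d_H^n}$ stand for the summation over the $d_H^p$ and $d_H^n$ bit metrics, with $d_H^p$ and $d_H^n$ denoting
the number of different coded bits between the two pairwise errors residing on the precoded and the
non-precoded subchannels specified by $\boldsymbol{\eta}$ and $\boldsymbol{\omega}$, respectively. By using the appropriate system input-output
relations, the PEP is written as

$$\Pr(\mathbf{c} \to \hat{\mathbf{c}} \mid \mathbf{H}) \le \Pr(\beta \ge \kappa) \tag{26}$$

where $\beta = \beta_p + \beta_n$,

$$\beta_p = -\sum_{k,d_H^p}\left[(\mathbf{x}_{k,\eta} - \hat{\mathbf{x}}_{k,\eta})^H\Theta^H\Gamma_p^H\mathbf{n}_k^p + (\mathbf{n}_k^p)^H\Gamma_p\Theta(\mathbf{x}_{k,\eta} - \hat{\mathbf{x}}_{k,\eta})\right],$$

$$\beta_n = -\sum_{k,d_H^n}\left[\lambda_{l'}(x_{k,l} - \hat{x}_{k,l})^* n_{k,l} + \lambda_{l'}(x_{k,l} - \hat{x}_{k,l})n_{k,l}^*\right],$$

and

$$\kappa = \sum_{k,d_H^p}\|\Gamma_p\Theta(\mathbf{x}_{k,\eta} - \hat{\mathbf{x}}_{k,\eta})\|^2 + \sum_{k,d_H^n}\lambda_{l'}^2|x_{k,l} - \hat{x}_{k,l}|^2.$$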

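Since $\beta$ in (26) is a Gaussian random variable with zero mean and variance $2N_0\kappa$, the PEP can be
expressed in a way similar to (21) with the Q-function. In addition, if we define $\sigma$ as

$$\sigma = \sum_{r=1}^{P}\lambda_{\eta_r}^2\sum_{k,d_H^p}|d_{k,r}|^2 + d_{\min}^2\sum_{r=1}^{S-P}\alpha_{\omega_r}\lambda_{\omega_r}^2 \tag{27}$$

where $d_{k,r}$ is the $r$-th entry of the vector $\mathbf{d}_k = \Theta(\mathbf{x}_{k,\eta} - \hat{\mathbf{x}}_{k,\eta})$, and $\alpha_s$ is the number of times the $s$-th
subchannel is used corresponding to the $d_H$ bits under consideration, then we can see that $\sigma \le \kappa$. Finally,
the average PEP is calculated as

$$\Pr(\mathbf{c} \to \hat{\mathbf{c}}) \le E\left[\frac{1}{2}\exp\left(-\frac{\sigma}{4N_0}\right)\right]. \tag{28}$$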



To determine the diversity order from $\sigma$, we need to find the index indicating the first nonzero element
in an ordered composite vector which consists of $\sum_{k,d_H^p}|d_{k,r}|^2$ and $d_{\min}^2\alpha_{\omega_r}$, as in Theorem 1. If $d_H^p = 0$, the
first summation part of $\sigma$ vanishes. In this case, the first index is

$$\delta = \min\{s : \alpha_s > 0 \text{ for } s \in \{\omega_1, \cdots, \omega_{(S-P)}\}\}. \tag{29}$$

In the other case of $d_H^p > 0$, we see that $\mathbf{x}_{k,\eta}$ and $\hat{\mathbf{x}}_{k,\eta}$ are obviously different for the same reason as in
the previous section. If the constellation precoder satisfies the sufficient condition of (24), the term with
$\lambda_{\eta_1}^2$ always exists in $\sigma$. By considering the second term of $\sigma$, we get $\delta$ for the case of $d_H^p > 0$ as

$$\delta = \begin{cases} \min(\eta_1, \delta') & \text{if } \delta' \text{ exists}, \\ \eta_1 & \text{otherwise}, \end{cases} \tag{30}$$

where $\delta'$, if it exists, is obtained in the same way as (29). If, in search of $\delta'$, no $s$ satisfying the right
hand side of (29) exists, we state that $\delta'$ does not exist and set $\delta = \eta_1$, as in (30).

Example: In this example, we employ a 4-state 1/2-rate convolutional code with generator polynomials
(5, 7) in octal representation, in an $N = M = S = 3$ system. Two types of spatial interleavers are used
to demonstrate the different results of the diversity order. A generalized transfer function of BICMB with
the specific spatial interleaver and convolutional code provides the $\boldsymbol{\alpha}$-vectors for all of the pairwise errors,
whose element indicates the number of times the stream is used for the erroneous bits [8]. In particular,
due to the fact that $d_H^p = \sum_{r=1}^{P}\alpha_{\eta_r}$ and $d_H^n = \sum_{r=1}^{S-P}\alpha_{\omega_r}$, where $\alpha_s$ is the $s$-th element of the $\boldsymbol{\alpha}$-vector,
the generalized transfer function approach in [8] is also useful in the analysis of BICMB-PP. Hence, we
rewrite the transfer functions of the systems from [8], where $a$, $b$, and $c$ are the symbolic representations
of the 1st, 2nd, and 3rd streams, respectively. The spatial interleaver used in $\mathbf{T}_1$ is a simple rotating switch on
3 streams. For $\mathbf{T}_2$, the $u$-th coded bit is interleaved into the stream $s_{\mathrm{mod}(u-1,18)+1}$ where $s_1 = \cdots = s_6 =
1$, $s_7 = \cdots = s_{12} = 2$, $s_{13} = \cdots = s_{18} = 3$ and mod is the modulo operation. Each term represents an
$\boldsymbol{\alpha}$-vector, and the powers of $a$, $b$, $c$ in this term indicate the elements of the $\boldsymbol{\alpha}$-vector corresponding to
that term.

Ti = Z\a%\ + + ah^c^) + Z\a%\ + a^^'c + + ah'^ + a'^hc^ + ah^c^) 

+ Z\2a^h^c + 2a%'^c^ + 2a%^c^ + 2a%c^ + 2a%^c^ + 2ah^c^) (31) 
+ Z\a'>h^ + a%\ + a%^c + 2a%\^ + ?>a%''c^ + 2a%^^ + a^6c=^ + ?>a%\''+ 
3a%^c^ + ab^c^ + b^c^ + a^bc^ + 2a%^c^ + ab^c^ + a^J') + ■■■ 

+ Z\a%'' + 3a363 + + a^c^ + ?>a%''^ + b^c^ + ?>a\'' + ?>b^^ + a^c^ + fe^c^) (32) 
+ Z\2a%^ + 2a^6^ + a%^c + Ta^fe^c^ + la^b^c^ + 2a^c^ + a=^6c^ + 7a%'^c^+ 
ab^c^ + 2b^c^ + 2a^c^ + 2b^c^) + ■■■ 

Consider the case $\boldsymbol{\eta} = [1\ 2]$. We see that all of the $\boldsymbol{\alpha}$-vectors of $\mathbf{T}_1$ have $d_H^p > 0$. Since $\eta_1 = 1$, $\delta$ equals
1 whether $\delta'$ exists or not. In fact, $\delta'$ does not exist for a term that contains only $a$ and $b$, i.e., one with $\alpha_3 = 0$. Therefore, the $\mathbf{T}_1$ BICMB-PP
system with $\boldsymbol{\eta} = [1\ 2]$ achieves the full diversity order, while BICMB without constellation precoding
[8], or PPMB without Bit-Interleaved Coded Modulation (BICM), loses the full diversity order [25], [26].
For $\mathbf{T}_2$, the $\boldsymbol{\alpha}$-vector $[0\ 0\ 5]$ gives $d_H^p = 0$, resulting in $\delta = 3$. Therefore, the $\mathbf{T}_2$ BICMB-PP system with
$\boldsymbol{\eta} = [1\ 2]$ does not achieve the full diversity order.

The same analysis for $\boldsymbol{\eta} = [1\ 3]$ results in the diversity order of 9, and $[2\ 3]$ results in 4 for the transfer
function $\mathbf{T}_1$. Similarly, both $[1\ 3]$ and $[2\ 3]$ result in the diversity order of 4 for $\mathbf{T}_2$. As a consequence, we find
that proper selection of the subchannels for precoding, as well as the appropriate pattern of the spatial
interleaver, is important to achieve the full diversity order of BICMB-PP. We will present simulation
results that verify this analysis in Section VI, in particular, in Fig. 8.
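The case analysis of (29) and (30) can be applied to a list of $\boldsymbol{\alpha}$-vectors with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the $\boldsymbol{\alpha}$-vector $[2\ 3\ 0]$ is an assumed example of a term without $c$ and is not read off the transfer functions, and taking the largest $\delta$ over all $\boldsymbol{\alpha}$-vectors reflects the assumption that the worst pairwise error dominates at high SNR.

```python
def bicmb_pp_delta(alpha, eta, omega):
    """Index of the first nonzero weight for one alpha-vector, following (29) and (30).

    alpha[s - 1] is the number of erroneous bits carried by subchannel s (1-based);
    eta and omega are increasingly ordered lists of subchannel indices.
    """
    d_p = sum(alpha[s - 1] for s in eta)             # erroneous bits on precoded subchannels
    in_error = [s for s in omega if alpha[s - 1] > 0]
    delta_prime = min(in_error) if in_error else None
    if d_p == 0:
        return delta_prime                           # (29); alpha is assumed to be nonzero
    return eta[0] if delta_prime is None else min(eta[0], delta_prime)   # (30)

def bicmb_pp_diversity(N, M, alphas, eta, omega):
    """Diversity order set by the alpha-vector with the largest delta."""
    delta = max(bicmb_pp_delta(a, eta, omega) for a in alphas)
    return (N - delta + 1) * (M - delta + 1)

eta, omega = [1, 2], [3]
d_only_ab = bicmb_pp_delta([2, 3, 0], eta, omega)    # assumed term with alpha_3 = 0
d_only_c = bicmb_pp_delta([0, 0, 5], eta, omega)     # all erroneous bits on stream 3
```

With every $\boldsymbol{\alpha}$-vector of a transfer function supplied in `alphas`, `bicmb_pp_diversity` gives the diversity order for a chosen $\boldsymbol{\eta}$, which makes it straightforward to compare subchannel selections.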

V. Reduced Computational Complexity Sphere Detection 

In this section, we will describe the reduced computational complexity sphere detection for constellation
precoded multiple beamforming with square QAM modulation. More specifically, we propose a sphere
detection technique that reduces the number of multiplications without losing performance. Since detecting
the transmitted non-precoded symbols for UMB-CP in (6) and finding the bit metrics of non-precoded
symbols for BICMB-CP in (9) can be carried out independently of the symbols on the other subchannels,
we focus on the $P$ precoded symbols.

Solving (5) for the ML detection is well-known to be NP-hard, given that a full search over the entire
lattice space is performed [27]. SD, on the other hand, solves (5) by searching only lattice points that lie
inside a sphere of radius $\rho$ centered around the received vector $\mathbf{y}_p$. A frequently used solution for the
QAM-modulated complex signal model is to decompose the $P$-dimensional complex-valued problem (5)
into a $2P$-dimensional real-valued problem, which is written as
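$$\check{\mathbf{y}} = \begin{bmatrix}\Re\{\mathbf{y}_p\} \\ \Im\{\mathbf{y}_p\}\end{bmatrix} = \check{\mathbf{F}}\check{\mathbf{x}} + \check{\mathbf{n}} = \begin{bmatrix}\Re\{\mathbf{F}\} & -\Im\{\mathbf{F}\} \\ \Im\{\mathbf{F}\} & \Re\{\mathbf{F}\}\end{bmatrix}\begin{bmatrix}\Re\{\mathbf{x}_\eta\} \\ \Im\{\mathbf{x}_\eta\}\end{bmatrix} + \begin{bmatrix}\Re\{\mathbf{n}_p\} \\ \Im\{\mathbf{n}_p\}\end{bmatrix} \tag{33}$$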



where $\mathbf{F} = \Gamma_p\Theta$ [15], [27]. The QR decomposition of the $2P \times 2P$ real-valued channel matrix turns (5)
into the equivalent expression

$$\hat{\check{\mathbf{x}}} = \arg\min_{\check{\mathbf{x}} \in \tilde{\Omega}}\|\mathbf{Q}^H\check{\mathbf{y}} - \mathbf{R}\check{\mathbf{x}}\|^2 \tag{34}$$

where $\check{\mathbf{y}}$ and $\check{\mathbf{x}}$ are the real-valued received and symbol vectors of (33), and $\mathbf{Q}$ and $\mathbf{R}$ are the unitary
matrix and the upper triangular matrix from the QR decomposition of the real-valued channel matrix
[15], [27]. Let $\Omega$ denote the set of scalar symbols for one dimension of QAM, e.g., $\Omega = \{-3, -1, 1, 3\}$
for 16-QAM; then $\tilde{\Omega}$ denotes a subset of $\Omega^{2P}$ whose elements satisfy $\|\mathbf{Q}^H\check{\mathbf{y}} - \mathbf{R}\check{\mathbf{x}}\|^2 < \rho^2$. The initial
radius $\rho$ should be chosen properly so that it is neither too small nor too large. Too small an initial radius
can result in too many unsuccessful searches, each of which restarts the search and thus increases the complexity,
while too large an initial radius can result in too many lattice points to be searched.
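The equivalence between the complex-valued metric of (5) and the real-valued, QR-transformed metric of (34) can be verified numerically. The following is a minimal sketch, assuming NumPy; the function name `real_decomposition` and the random test data are illustrative, and the matrix `F` here is an arbitrary complex matrix standing in for $\Gamma_p\Theta$.

```python
import numpy as np

def real_decomposition(F, y):
    """Stack real and imaginary parts as in (33): returns the 2P x 2P real matrix and 2P real vector."""
    F_real = np.block([[F.real, -F.imag],
                       [F.imag,  F.real]])
    y_real = np.concatenate([y.real, y.imag])
    return F_real, y_real

rng = np.random.default_rng(0)
P = 2
F = rng.standard_normal((P, P)) + 1j * rng.standard_normal((P, P))   # stand-in for Gamma_p @ Theta
x = np.array([1 + 1j, -1 + 1j])                                       # 4-QAM symbols
y = F @ x + 0.1 * (rng.standard_normal(P) + 1j * rng.standard_normal(P))

F_real, y_real = real_decomposition(F, y)
x_real = np.concatenate([x.real, x.imag])
Q, R = np.linalg.qr(F_real)                  # Q orthogonal, R upper triangular

metric_complex = np.linalg.norm(y - F @ x) ** 2
metric_real = np.linalg.norm(Q.T @ y_real - R @ x_real) ** 2
```

Because `Q` is orthogonal, multiplying by its transpose leaves the Euclidean norm unchanged, so both metrics agree and the upper triangular `R` allows the metric to be accumulated one layer at a time.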

The SD algorithm can be viewed as a pruning algorithm on a tree of depth $2P$, whose branches
correspond to elements drawn from the set $\Omega$ [23], [27]. Conventional SD implements a Depth-First
Search (DFS) strategy in the tree, which achieves ML performance. The complexity of SD is measured
in terms of the number of operations required per visited node multiplied by the number of visited nodes
throughout the search algorithm [27]. The complexity can be reduced by either reducing the number of
nodes to be visited or the number of operations to be carried out at each node, or both. In order to reduce
the number of visited nodes, one can either make a judicious choice of the initial radius to start the
algorithm, or execute a proper sphere radius update strategy. The former strategy has been studied in
[16] and [17], and the latter one has been discussed in [18] and [19]. In this paper, we propose methods
to reduce the average number of real multiplications, which are the most expensive operations in terms
of machine cycles required at each node for conventional SD. A proper choice of the initial radius for
BICMB-CP will also be provided.

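We start by writing the node weight as [23]

$$w(\mathbf{x}^{(u)}) = w(\mathbf{x}^{(u+1)}) + w_{pw}(\mathbf{x}^{(u)}) \tag{35}$$

with $u = 2P, 2P - 1, \cdots, 1$, $w(\mathbf{x}^{(2P+1)}) = 0$, and $w_{pw}(\mathbf{x}^{(2P+1)}) = 0$, where $\mathbf{x}^{(u)}$ denotes the partial
vector symbol at layer $u$. The partial weight $w_{pw}(\mathbf{x}^{(u)})$ is written as

$$w_{pw}(\mathbf{x}^{(u)}) = \left|\tilde{y}_u - \sum_{v=u}^{2P}R_{u,v}x_v\right|^2 \tag{36}$$

where $\tilde{y}_u$ is the $u$-th element of $\mathbf{Q}^H\check{\mathbf{y}}$, $R_{u,v}$ is the $(u,v)$-th element of $\mathbf{R}$, and $x_v$ is the $v$-th element of $\check{\mathbf{x}}$.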
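The recursion in (35) and (36) maps directly onto a depth-first tree search. The sketch below shows a conventional DFS sphere decoder, assuming NumPy, and is not the reduced-complexity algorithm proposed in this paper; the function name `sphere_decode` is hypothetical, and shrinking the radius whenever a leaf is reached is one common update rule chosen here for illustration, not necessarily the strategy of the cited works.

```python
import itertools
import numpy as np

def sphere_decode(R, y_tilde, omega, radius_sq=np.inf):
    """Depth-first search for argmin ||y_tilde - R x||^2 with x in omega^n and R upper triangular."""
    n = R.shape[0]
    best = {"x": None, "w": radius_sq}
    x = np.zeros(n)

    def visit(u, weight_above):              # u is a 0-based layer index, searched from n-1 down to 0
        for symbol in omega:
            x[u] = symbol
            partial = (y_tilde[u] - R[u, u:] @ x[u:]) ** 2   # partial weight, as in (36)
            weight = weight_above + partial                  # node weight, as in (35)
            if weight >= best["w"]:
                continue                                     # node lies outside the sphere: prune
            if u == 0:
                best["x"], best["w"] = x.copy(), weight      # leaf reached: shrink the radius
            else:
                visit(u - 1, weight)

    visit(n - 1, 0.0)
    return best["x"], best["w"]

rng = np.random.default_rng(1)
omega = (-3, -1, 1, 3)                       # one dimension of 16-QAM
A = rng.standard_normal((4, 4))              # stand-in for the real-valued channel matrix
Q, R = np.linalg.qr(A)
x_true = rng.choice(omega, size=4).astype(float)
y_tilde = Q.T @ (A @ x_true + 0.3 * rng.standard_normal(4))

x_sd, w_sd = sphere_decode(R, y_tilde, omega)
# Exhaustive search over all |omega|^4 points for comparison
x_ml = min(itertools.product(omega, repeat=4),
           key=lambda c: np.linalg.norm(y_tilde - R @ np.array(c)) ** 2)
```

Each visited node evaluates the inner product `R[u, u:] @ x[u:]`, and these per-node multiplications are the cost that the precalculation described next is designed to avoid. With a finite `radius_sq`, the function returns `None` for the vector when no lattice point lies inside the sphere, which corresponds to the restart case discussed above.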

A. Precalculation of Multiplications 

Note that for one channel realization, both $\mathbf{R}$ and $\Omega$ are independent of time. In other words, to decode
different received symbols for one channel realization, the only term in (36) which depends on time is
$\tilde{y}_u$. Consequently, a table $\mathbb{T}$ can be constructed to store all terms of $R_{u,v}x$, where $R_{u,v} \ne 0$ and $x \in \Omega$,
before starting the tree search procedure. Equations (35) and (36) imply that, by using $\mathbb{T}$, only one real multiplication
is needed for each node to calculate the node weight, instead of $2P - u + 2$. As a result, the
number of real multiplications can be significantly reduced.






Taking the square QAM structure into consideration, $\Omega$ can be divided into two smaller sets: $\Omega_1$ with 
negative elements and $\Omega_2$ with positive elements. Taking 16-QAM as an example, $\Omega = \{-3, -1, 1, 3\}$, so 
$\Omega_1 = \{-3, -1\}$ and $\Omega_2 = \{1, 3\}$. Any negative element in $\Omega_1$ has a positive element with the same 
absolute value in $\Omega_2$. Consequently, in order to build $\mathbf{T}$, only terms of $R_{u,v}x$, where $R_{u,v} \neq 0$ and $x \in \Omega_1$, 
need to be calculated and stored. Hence, the size of $\mathbf{T}$ is 

$$|\mathbf{T}| = \frac{N_R |\Omega|}{2} \qquad (37)$$

where $N_R$ denotes the number of nonzero elements in matrix $\mathbf{R}$, and $|\Omega|$ denotes the size of $\Omega$. 

In order to build T, both the number of terms that need to be stored and the number of real multiplications 
required are |T|. Since the channel is assumed to be flat fading, only one T needs to be built in one burst. 
If the burst length is very long, the computational complexity of building T can be neglected. 
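
The table construction described in this subsection can be sketched as follows. This is an illustrative sketch only: the dictionary layout, the function names `build_table` and `lookup`, and the example matrix are assumptions, not the authors' implementation.

```python
import numpy as np

def build_table(R, omega):
    """Store R[u, v] * x only for nonzero R[u, v] and negative x (the set Omega_1)."""
    omega_1 = [x for x in omega if x < 0]
    rows, cols = np.nonzero(R)
    return {(int(u), int(v), x): R[u, v] * x
            for u, v in zip(rows, cols) for x in omega_1}

def lookup(table, u, v, x):
    """Return R[u, v] * x without multiplying; positive x reuses the entry of -x."""
    stored = table.get((u, v, -abs(x)), 0.0)  # zero entries of R are not stored
    return stored if x < 0 else -stored

# Hypothetical 4 x 4 upper triangular R with the extra zeros R[0,1] = R[2,3] = 0.
R = np.triu(np.arange(1.0, 17.0).reshape(4, 4))
R[0, 1] = 0.0
R[2, 3] = 0.0
omega = [-3, -1, 1, 3]  # real-valued alphabet of 16-QAM

table = build_table(R, omega)
table_size = len(table)  # equals N_R * |Omega| / 2 as in Eq. (37)
```

Here a sign flip replaces the multiplication for positive symbols, which is why only half of the alphabet needs to be stored.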

B. Modified DFS algorithm 

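The representation proposed in [23] replaces the conventional representation of (33) with 

$$\mathbf{y} = \mathbf{G}\mathbf{x} + \mathbf{n} \qquad (38)$$

where $\mathbf{y}$, $\mathbf{x}$, and $\mathbf{n}$ are the $2P$-dimensional real-valued counterparts of the received, transmitted, and noise vectors, and $\mathbf{G}$ is the corresponding $2P \times 2P$ real-valued matrix. 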
The structure of the lattice representation becomes advantageous after applying the QR decomposition 
to $\mathbf{G}$, i.e., $\mathbf{G} = \mathbf{Q}\mathbf{R}$. Due to a special form of orthogonality between each pair of columns, all elements 
$R_{u,u+1}$ for $u = 1, 3, \ldots, 2P - 1$ in the upper triangular matrix $\mathbf{R}$ become zero [23]. The locations 
of these zeros introduce orthogonality between the real and imaginary parts of every detected symbol, 
which can be taken advantage of to reduce the computational complexity of SD. We provide the following 
example to explain this. 

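Consider a $2 \times 2$, $S = 2$ FPMB system employing 4-QAM. Then, SD constructs a tree with $2P = 4$ 
levels, where the branches coming out of each node represent the real values in the set $\Omega = \{-1, 1\}$. This 
tree is shown in Fig. 2. Based on the representation in (38), the input-output relation is given by 

$$\begin{bmatrix} \tilde{y}_1 \\ \tilde{y}_2 \\ \tilde{y}_3 \\ \tilde{y}_4 \end{bmatrix} = \begin{bmatrix} R_{1,1} & 0 & R_{1,3} & R_{1,4} \\ 0 & R_{2,2} & R_{2,3} & R_{2,4} \\ 0 & 0 & R_{3,3} & 0 \\ 0 & 0 & 0 & R_{4,4} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} + \begin{bmatrix} \tilde{n}_1 \\ \tilde{n}_2 \\ \tilde{n}_3 \\ \tilde{n}_4 \end{bmatrix} \qquad (39)$$

where $\tilde{y}_u$, $x_u$, and $\tilde{n}_u$ are the $u$-th elements of the vectors $\mathbf{Q}^T\mathbf{y}$, $\mathbf{x}$, and $\mathbf{Q}^T\mathbf{n}$, respectively, and $R_{u,v}$ is the $(u,v)$-th element 
of $\mathbf{R}$. 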

Because of the additional zeros in the $\mathbf{R}$ matrix, the partial node weights of (39) at the first and second 
levels can be calculated independently of each other, as can those at the third and fourth levels. For instance, the 
partial weights of nodes A and B in Fig. 2 depend only on $x_3$, and the partial weights of nodes C, D, 
E, and F depend on $x_4$, $x_3$, and $x_1$, but not on $x_2$. In other words, the partial weights of nodes A and B are 
equal, and need to be calculated only once. Similarly, the partial weights of nodes C and D can be reused, without 
additional computation, as the partial weights of nodes E and F, respectively. 
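
The zero pattern in $\mathbf{R}$ can be checked numerically with a short sketch. The construction of $\mathbf{G}$ below is an assumption: it maps each complex channel entry $h$ to the $2 \times 2$ real block $[[\Re h, -\Im h], [\Im h, \Re h]]$, so that the real and imaginary parts of each symbol occupy adjacent columns. The function name `real_lattice_matrix` and the random channel are likewise hypothetical.

```python
import numpy as np

def real_lattice_matrix(H):
    """Assumed real-valued representation: each complex entry h becomes the
    2 x 2 block [[Re h, -Im h], [Im h, Re h]]."""
    rows, cols = H.shape
    G = np.zeros((2 * rows, 2 * cols))
    G[0::2, 0::2] = H.real
    G[0::2, 1::2] = -H.imag
    G[1::2, 0::2] = H.imag
    G[1::2, 1::2] = H.real
    return G

# Hypothetical 2 x 2 complex channel, giving a 4 x 4 real matrix G (2P = 4).
rng = np.random.default_rng(1)
H = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
G = real_lattice_matrix(H)

Q, R = np.linalg.qr(G)

# Entries R[u, u+1] for 0-indexed even u, i.e. R_{1,2} and R_{3,4} in the text.
extra_zeros = [R[u, u + 1] for u in range(0, G.shape[1] - 1, 2)]
```

Under this assumed structure, columns $2k-1$ and $2k$ of $\mathbf{G}$ are orthogonal and remain so after projecting out earlier column pairs, which is why the listed entries vanish up to rounding error.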

Because of this feature, the DFS strategy is modified in the following way: for the $u$-th layer, where 
$u$ is an odd number, partial weights of the nodes at layer $u$ (called children nodes) belonging to a 
node at layer $u + 1$ (called a parent node) are stored, and are used as partial weights of the nodes 
belonging to the same node at layer $u + 2$ (called a grandparent node), but to different parent 
nodes. In other words, the weights of children nodes belonging to one of the parent nodes are recycled 
by the children's cousins. 

By implementing the modified DFS algorithm, further complexity reduction is achieved beyond the 
reduction due to the precalculation table $\mathbf{T}$. We now quantify the reduction in the number of real multiplications needed to 
calculate all nodes at layers $u$ and $u + 1$ belonging to one grandparent node at layer $u + 2$, where $u$ is an odd 
number. Let $\nu \in [0, |\Omega|]$ denote the number of non-pruned branches from the grandparent node, after 
calculating the node weights $w(\mathbf{x}^{(u+1)})$ and comparing them with $\rho^2$. If $\nu = 0$, which means all branches 
from the grandparent node are pruned, the modified algorithm does not reduce computations from the 
original DFS algorithm. If $\nu > 0$, to get all of the weights at layers $u$ and $u + 1$ under the grandparent 
node, the number of real multiplications reduces further from $(\nu + 1)|\Omega|$ to $2|\Omega|$. 
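
As a small worked illustration of these counts, the sketch below assumes one real multiplication per visited node (the case when the table $\mathbf{T}$ is used). The function names are hypothetical.

```python
def mults_original(nu, omega_size):
    """Original DFS: omega_size parent nodes at layer u+1, plus omega_size
    children at layer u for each of the nu surviving parents."""
    return (nu + 1) * omega_size

def mults_modified(nu, omega_size):
    """Modified DFS: children weights are computed once and recycled by cousins."""
    if nu == 0:
        return omega_size  # all parents pruned, nothing to recycle
    return 2 * omega_size

# Example: 64-QAM has a real alphabet of size 8; suppose all 8 parents survive.
original = mults_original(8, 8)
modified = mults_modified(8, 8)
```

The two counts coincide for $\nu = 0$ and $\nu = 1$; the saving appears once two or more parent nodes survive pruning.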

C. Initial Radius for BICMB-CP 

The proposed SD algorithm for UMB-CP described in the previous sections can also be applied to 
BICMB-CP. The $P$-dimensional complex-valued input-output relation of the precoded part can 
be transformed into a $2P$-dimensional real-valued problem, based on the lattice representation in (38). 
Applying the QR decomposition to the $2P \times 2P$ matrix $\mathbf{G}$ in (38), the bit metrics of the 
precoded part are rewritten as 

$$\gamma^{l,i}(\mathbf{r}_k, c_{k'}) = \min_{\mathbf{x} \in \chi^{l,i}_{c_{k'}}} \|\tilde{\mathbf{r}}_k - \mathbf{R}\mathbf{x}\|^2 \qquad (40)$$

where $\tilde{\mathbf{r}}_k$ is the product of $\mathbf{Q}^T$ and the transformed vector from $\mathbf{r}_k$. Due to the transformation, the position 
of $c_{k'}$ in the label of $\mathbf{x}$ needs to be acquired and stored in a new table $k' \to (k, l, i)$, which means $c_{k'}$ lies 
in the $i$-th bit position of the label for the $l$-th element of the real-valued symbol vector $\mathbf{x}$. Let $\Omega_b^i$ denote a subset 
of $\Omega$ whose labels have $b \in \{0, 1\}$ in the $i$-th bit position. If we define $\chi_b^{l,i}$ as 

$$\chi_b^{l,i} = \{\mathbf{x} = [x_1 \cdots x_{2P}]^T : x_{s|s=l} \in \Omega_b^i, \text{ and } x_{s|s \neq l} \in \Omega\}$$

then $\tilde{\chi}_b^{l,i}$ denotes the subset of $\chi_b^{l,i}$ whose elements satisfy $\|\tilde{\mathbf{r}}_k - \mathbf{R}\mathbf{x}\|^2 < \rho_b^2$. 

Similarly to UMB-CP, the SD algorithm for BICMB-CP can now be viewed as a pruning algorithm on 
a tree of depth $2P$. However, its branches at the layer $u = l$ correspond to elements drawn only from the 
set $\Omega^i_{c_{k'}} \subset \Omega$. To determine the initial radius for BICMB-CP, we use the ZF-DFE algorithm to acquire an 
estimated real-valued vector symbol $\hat{\mathbf{x}}_k^b$ for $b = 0$ or $1$, whose $u$-th element $\hat{x}^b_{k,u}$ is detected successively 
from $\hat{x}^b_{k,2P}$ to $\hat{x}^b_{k,1}$ as 

$$\hat{x}^b_{k,u} = \arg\min_{x \in \Omega_b^i} \Big| \tilde{r}_{k,u} - \sum_{v=u+1}^{2P} R_{u,v}\hat{x}^b_{k,v} - R_{u,u}x \Big| \qquad (41)$$

for the element corresponding to $l$ indicated by the table $k' \to (k, l, i)$, and 

$$\hat{x}^b_{k,u} = \arg\min_{x \in \Omega} \Big| \tilde{r}_{k,u} - \sum_{v=u+1}^{2P} R_{u,v}\hat{x}^b_{k,v} - R_{u,u}x \Big| \qquad (42)$$

for the rest of the elements. Then, the initial radius is calculated by 

$$\rho_b^2 = \|\tilde{\mathbf{r}}_k - \mathbf{R}\hat{\mathbf{x}}_k^b\|^2. \qquad (43)$$

With the initial radius acquired by the ZF-DFE algorithm, the SD guarantees no unsuccessful search for 
both of the bit metrics. 
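
The initial-radius computation in (41) to (43) can be sketched as below. This is a minimal sketch under stated assumptions: the function name `zf_dfe_radius`, its argument layout, the 0-indexed layer `l`, and the example matrix are all hypothetical, and the constrained subset `omega_b` is passed in directly rather than derived from a bit labeling.

```python
import numpy as np

def zf_dfe_radius(r_tilde, R, omega, omega_b, l):
    """Detect elements from the last layer to the first. The element at
    0-indexed layer l is restricted to omega_b (Eq. 41); every other element
    uses the full alphabet omega (Eq. 42). Returns the estimate and the
    squared initial radius of Eq. (43)."""
    n = len(r_tilde)
    x_hat = np.zeros(n)
    for u in range(n - 1, -1, -1):
        candidates = omega_b if u == l else omega
        # Contribution of the already detected elements v > u.
        interference = R[u, u + 1:] @ x_hat[u + 1:]
        x_hat[u] = min(candidates,
                       key=lambda x: abs(r_tilde[u] - interference - R[u, u] * x))
    rho_sq = float(np.sum((r_tilde - R @ x_hat) ** 2))
    return x_hat, rho_sq

# Hypothetical noise-free example with the real alphabet of 4-QAM.
R = np.array([[2.0, 0.0, 0.5, 0.3],
              [0.0, 2.0, 0.1, -0.4],
              [0.0, 0.0, 1.5, 0.0],
              [0.0, 0.0, 0.0, 1.5]])
x_true = np.array([1.0, -1.0, 1.0, -1.0])
r_tilde = R @ x_true
omega = [-1.0, 1.0]

x_hat_match, rho_match = zf_dfe_radius(r_tilde, R, omega, [1.0], 0)
x_hat_other, rho_other = zf_dfe_radius(r_tilde, R, omega, [-1.0], 0)
```

Because the estimate itself satisfies the bit constraint, it lies on or inside the sphere it defines, so a search started from this radius always contains at least one valid candidate.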

VI. Simulation Results 

A. UMB-CP 

To illustrate the analysis of the diversity order in Section II, we now present simulation results over a 
number of different system configurations. Fig. 3 shows BER performance for SB and FPMB. The curves 
with the legend FPMB are generated by the precoding matrices that outperform the others in [25], [26]. 
All of the FPMB systems employ 4-QAM modulation, and the system data rate for SB and FPMB is set 
to 4 and 8 bits/channel use for a $2 \times 2$ and a $4 \times 4$ system, respectively. All of the FPMB systems are shown 
to achieve the full diversity order since each slope is parallel to that of the corresponding SB system, known to 
achieve the full diversity order of $NM$. 

Simulation results to support the diversity analysis of $4 \times 4$, $S = 4$ PPMB in Table I are provided in 
Fig. 4. We find that the simulation results match the diversity orders in Table I. 

To verify the reduced computational complexity with sphere detection in Section V, we simulated 
$2 \times 2$, $S = 2$ and $4 \times 4$, $S = 4$ FPMB systems using 4-QAM and 64-QAM with receivers employing the 
exhaustive search (EXH), the conventional SD (CSD), and the proposed SD (PSD). In these simulations, 
the initial radius is chosen to be $\rho^2 = 2N_0P$, inside which at least one lattice point lies with a high 
probability [18]. The average number of real multiplications for decoding one transmitted vector symbol 
is calculated at different SNR values. Since the reductions in complexity are substantial, we will express them 
as orders of magnitude (in approximate terms) in the sequel. Fig. 5 shows a comparison for the $2 \times 2$, 
$S = 2$ FPMB system. For 4-QAM, a comparison with EXH shows that CSD reduces the number of 
multiplications by approximately 0.6 and 0.8 orders of magnitude at low and high SNR, respectively, 
and PSD reduces it by approximately 1.0 and 1.1 orders of magnitude at low and high SNR, respectively. 
As seen from the case of 64-QAM in Fig. 5, the reduction in complexity increases as the constellation 
size increases: the number of multiplications of CSD decreases by approximately 1.4 orders of magnitude 
at low SNR, and 2.8 at high SNR, while that of PSD decreases by 2.4 and 3.2 orders of magnitude at 
low and high SNR, respectively. Fig. 6 shows the simulation results for the $4 \times 4$, $S = 4$ FPMB system. For 
4-QAM, the number of multiplications of CSD is reduced by 1.4 and 2.1 orders of magnitude at low and 
high SNR, respectively. PSD reduces the complexity by 2.1 orders of magnitude at low SNR, and 2.4 at 
high SNR. As already observed in Fig. 5, the reduction becomes larger as the constellation size increases 
in the $4 \times 4$, $S = 4$ FPMB system. For 64-QAM, the number of multiplications of CSD decreases by 
3.3 and 6.4 orders of magnitude at low and high SNR, respectively. PSD gives a larger reduction of 4.3 
orders of magnitude at low SNR, and 7.0 at high SNR. Simulation results clearly show that CSD reduces 
the complexity substantially compared with EXH, and the complexity can be further reduced effectively 
by our PSD. The complexity reduction becomes larger as the constellation precoder dimension or the 
constellation size becomes larger. 

B. BICMB-CP 

To verify the diversity analysis in Section III, Fig. 7 depicts the simulation results for $2 \times 2$, $3 \times 3$, 
and $4 \times 4$ BICMB and BICMB-FP with a 64-state convolutional code punctured from the 1/2-rate mother code 
with generator polynomials (133, 171) in octal representation. In [8], we showed that the maximum achievable 
diversity order of BICMB with an $R_c$-rate convolutional code is $(N - \lceil S \cdot R_c \rceil + 1)(M - \lceil S \cdot R_c \rceil + 1)$. In 
this example, the maximum achievable diversity order of the three BICMB systems is 1. However, Fig. 
7 shows that BICMB-FP achieves the full diversity order for any code rate. 
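
The diversity order expression quoted from [8] is easy to evaluate for specific configurations. The sketch below is illustrative only; the function name `bicmb_max_diversity` is hypothetical, and the code rate is passed as an exact fraction to avoid floating-point rounding inside the ceiling.

```python
from fractions import Fraction
from math import ceil

def bicmb_max_diversity(N, M, S, Rc):
    """Evaluate (N - ceil(S*Rc) + 1) * (M - ceil(S*Rc) + 1) for BICMB
    without constellation precoding. Rc should be a Fraction."""
    k = ceil(S * Fraction(Rc))
    return (N - k + 1) * (M - k + 1)

# Configurations that appear in this section.
d_2x2 = bicmb_max_diversity(2, 2, 2, Fraction(2, 3))
d_4x4 = bicmb_max_diversity(4, 4, 4, Fraction(4, 5))
d_3x3 = bicmb_max_diversity(3, 3, 3, Fraction(1, 2))
```

Whenever $\lceil S \cdot R_c \rceil = S = N = M$, the expression collapses to 1, which is the case where constellation precoding is needed to recover the full diversity order.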

Fig. 8 depicts the simulation results of BICMB-PP given in the example of Section III-B. The diversity 
orders of the BICMB systems $T_1$ and $T_2$ are 4 and 1, respectively [8]. Comparing the slopes of BICMB-PP 
with BICMB, we see that the simulation results match the analysis in Section III-B. 

To verify the proposed sphere decoding technique for BICMB-FP, we simulated $2 \times 2$, 
$S = 2$, 64-state $R_c = 2/3$ BICMB-FP systems, and $4 \times 4$, $S = 4$, 64-state $R_c = 4/5$ BICMB-FP systems 
using 4-QAM and 64-QAM modulation with Gray mapping. The average number of real multiplications 
for acquiring one bit metric is calculated with receivers employing EXH, CSD, and PSD. Initial radii for 
both CSD and PSD are determined by the ZF-DFE algorithm. In Fig. 9, we observe that the number 
of multiplications of CSD for 4-QAM is reduced by 0.4 and 0.5 orders of magnitude at low and high 
SNR, respectively. PSD yields larger reductions of 1.0 and 1.1 orders of magnitude at low and high 
SNR, respectively. In the case of 64-QAM, reductions between CSD and EXH are 1.5 and 2.1 orders 
of magnitude at low and high SNR, respectively, while larger reductions of 2.4 and 2.9 are achieved by 
PSD. Fig. 10 shows that the number of multiplications of CSD for 4-QAM decreases by 1.3 and 1.5 orders 
of magnitude at low and high SNR, respectively. PSD gives larger reductions of 2.1 orders of magnitude 
at low SNR, and 2.3 at high SNR. For the 64-QAM case, reductions between EXH and CSD of 3.2 and 
4.4 orders of magnitude are observed at low and high SNR, respectively, while larger reductions of 4.2 
and 5.4 are achieved by PSD. Similar to the uncoded case, the complexity reduction becomes larger as 
the constellation precoder dimension or the constellation size becomes larger. One important property of 
our decoding technique needs to be emphasized: the substantial complexity reduction achieved causes no 
performance degradation. 

VII. Conclusion 

In this paper, we proposed constellation precoded multiple beamforming, which achieves the full diversity 
order in both uncoded and coded MIMO multiple beamforming systems when the channel information 
is perfectly available at the transmitter as well as the receiver, at different levels of spatial multiplexing, 
including the maximum, $\min(N, M)$, provided by the $N \times M$ channel. Diversity analysis was given for both 
of the multiple beamforming schemes through the calculation of pairwise error probability. We provided 
examples of calculating the diversity orders of various multiple beamforming systems and simulation 
results supporting the analysis. A sphere detection algorithm which reduces the complexity was proposed 
so that constellation precoded multiple beamforming can be considered as a practical implementation for 
MIMO systems requiring high throughput with the full diversity order. The proposed SD algorithm in 
this paper can be applied to any MIMO system. 

(a) Uncoded Multiple Beamforming with Constellation Precoding. 
(b) Bit-Interleaved Coded Multiple Beamforming with Constellation Precoding. 
Fig. 1. Structure of Constellation Precoded Multiple Beamforming. 

TABLE I 
Diversity order ($O_{div}$) of $4 \times 4$, $S = 4$ partially precoded multiple beamforming system 

Fig. 2. Tree structure for a $2 \times 2$ FPMB system employing 4-QAM. 
Fig. 3. BER vs. SNR comparison for $2 \times 2$, $4 \times 4$ SB and FPMB. 
Fig. 5. Average number of real multiplications vs. SNR for the $2 \times 2$ FPMB systems with 4-QAM and 64-QAM. 
Fig. 6. Average number of real multiplications vs. SNR for the $4 \times 4$ FPMB systems with 4-QAM and 64-QAM. 
Fig. 7. BER comparison between BICMB and BICMB-FP with 16-QAM, and 64-state punctured convolutional code. 
Fig. 8. BER vs. SNR for BICMB-PP with $3 \times 3$, $S = 3$, 4-QAM, and 4-state 1/2-rate convolutional code. 
Fig. 9. Average number of real multiplications vs. SNR for the $2 \times 2$ BICMB-FP systems with 4-QAM and 64-QAM. 
Fig. 10. Average number of real multiplications vs. SNR for the $4 \times 4$ BICMB-FP systems with 4-QAM and 64-QAM. 



